173 results found.
Language Type:
Multilingual
Languages:
Portuguese
Availability:
From Owner
License:
<Not Specified>
Size:
96331 sentences
Production Status:
Existing-updated
Use:
Acquisition
- Paper title: A corpus of European Portuguese child and child-directed speech
- Paper track: Speech
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Ana Lúcia Santos | Universidade de Lisboa | PT |
| Author 2 | Michel Généreux | University of Lisbon | CA |
| Author 3 | Aida Cardoso | Universidade de Lisboa | PT |
| Author 4 | Celina Agostinho | Universidade de Lisboa | PT |
| Author 5 | Silvana Abalada | Universidade de Lisboa | PT |
| Main Contact | Ana Lúcia Santos | Universidade de Lisboa | None |
Documentation:
Yes, Documentation in English.
Multimodal/Multimedia
Corpus,
Language Type:
Bilingual
Languages:
English Portuguese
Availability:
Freely Available
License:
CreativeCommons
Size:
2000 hours
Production Status:
Existing-used
Use:
Summarisation
- Paper title: Multimodal Abstractive Summarization for How2 Videos
- Paper track: Short/Vision, Robotics, Multimodal, Grounding and Speec
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Shruti Palaskar | How2 dataset | /N |
Documentation:
Yes
Written
Corpus,
Language Type:
Multilingual
Languages:
Afrikaans Albanian Amharic Arabic Aragonese Armenian Assamese Azerbaijani Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan Central Khmer Chinese Croatian Czech Danish Dutch Dzongkha English Esperanto Estonian Finnish French Gaelic Galician Georgian German Greek Gujarati Hausa Hebrew Hindi Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Kannada Kazakh Kinyarwanda Korean Kurdish Kyrgyz Latvian Limburgan Lithuanian Macedonian Malagasy Malay Malayalam Maltese Marathi Mongolian Nepali Northern Sami Norwegian Norwegian Bokmål Norwegian Nynorsk Occitan Oriya Panjabi Pashto Persian Polish Portuguese Romanian Russian Serbian Serbo-Croatian Sinhala Slovak Slovenian Spanish Swedish Tajik Tamil Tatar Telugu Thai Turkish Turkmen Uighur Ukrainian Urdu Uzbek Vietnamese Walloon Welsh Western Frisian Xhosa Yiddish Yoruba Zulu
Availability:
Freely Available
License:
Size:
55 million sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Biao Zhang | the open parallel corpus (OPUS) | /N |
Documentation:
None
Not Applicable
Contextualised word embeddings,
Language Type:
Monolingual
Languages:
Ancient Arabic Basque Bokmål Bulgarian Catalan Chinese Church Croatian Czech Danish Dutch English Estonian Finnish French Galician German Greek Hebrew Hindi Hungarian Indonesian Irish Italian Japanese Korean Latin Latvian Norwegian Nynorsk Old Persian Polish Portuguese Romanian Russian Simplified Chinese Slavonic Slovak Slovene Spanish Swedish Turkish Ukrainian Urdu Uyghur Vietnamese
Availability:
Freely Available
License:
none
Size:
18.4 GByte
Production Status:
Existing-used
Use:
Parsing and Tagging
- Paper title: Treebank Embedding Vectors for Out-of-domain Dependency Parsing
- Paper track: Short/Syntax: Tagging, Chunking and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joachim Wagner | Elmo For Many Languages | /N |
Documentation:
https://www.aclweb.org/anthology/K18-2005/
Written
Treebank,
Language Type:
Monolingual
Languages:
Dutch German Italian Norwegian Portuguese
Availability:
Freely Available
License:
CreativeCommons
Size:
None
Production Status:
Use:
Parsing and Tagging
- Paper title: Extracting Headless MWEs from Dependency Parse Trees: Parsing, Tagging, and Joint Modeling Approaches
- Paper track: Long/Syntax: Tagging, Chunking and Parsing
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Tianze Shi | Universal Dependencies 2.2 | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Catalan French Italian Portuguese
Availability:
Freely Available
License:
GNU
Size:
61.9 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings
- Paper track: 8.5 Novel neural network architectures (e.g. seque/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Matthew Wiesner | VoxForge French, Italian, Portuguese, and Catalan Subsets | /N |
Documentation:
Yes. It can be found at http://www.voxforge.org/
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
English French German Portuguese Romanian Russian Spanish
Availability:
Freely Available
License:
CreativeCommons
Size:
500 hours
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Adapting Transformer to End-to-End Spoken Language Translation
- Paper track: 12.1 Spoken machine translation/Oral Presentation
- Paper status: Accept - Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mattia A. Di Gangi | MuST-C | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Basque French German Italian Portuguese Spanish
Availability:
From Data Center(s)
License:
META-SHARE and/or CC
Size:
1040 hours
Production Status:
Newly created-in progress
Use:
Speech Recognition/Understanding
- Paper title: Recognition of Latin American Spanish using Multi-task Learning
- Paper track: 8.12 Cross-lingual and multilingual/accent aspects/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Carlos Mendes | SAVAS META-SHARE repository | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Amharic Bosnian Croatian Dari English French Georgian Haitian Hausa Hindi Korean Mandarin Chinese Persian Portuguese Pushto Russian Spanish Turkish Ukrainian Urdu Vietnamese Yue Chinese
Availability:
From Owner
License:
LDC
Size:
215 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2009 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Dutch English French German Italian Portuguese Romanian Spanish
Availability:
Freely Available
License:
Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License
Size:
None
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: MuST-Cinema: a Speech-to-Subtitles corpus
- Paper track: Multimodality/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Alina Karakanta | MuST-Cinema | /N |
Documentation:
Documentation publicly available in English




